-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CMS Forms Flexible Imports #419
CMS Forms Flexible Imports #419
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! few comments
@@ -425,6 +425,41 @@ def test_single_assessment(self, impexp: ImportExport) -> None: | |||
imported = impexp.get_assessment_json() | |||
assert imported == orig | |||
|
|||
def test_snake_case_assessments(self, csv_impexp: ImportExport) -> None: | |||
""" | |||
Importing a csv with spaces in header names and uppercase text should be coverted to snake case |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spelling: "coverted"
model_fields = [field.name for field in assessment_model._meta.get_fields()] | ||
|
||
expected_fields = { | ||
"title": True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the True value here not redundant since they're all True? Removing this would make your for loop below less complicated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually that's true.
home/import_assessments.py
Outdated
row_iterator = parse_file(self.file_content, self.file_type) | ||
rows = [row for _, row in row_iterator] | ||
|
||
if rows: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't look like we have a test covering rows being empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added it in but I also had to modify the helper function here:
contentrepo/home/import_helpers.py
Line 172 in ad6ca7b
first_row = next(rows) |
It would throw a runtime stop-iteration error becuase it will try use next(rows) on an empty iterator since the file is empty.
So I put it in a try-except block to catch the exception if the iterator is empty.
Then in the parse_file function, I added an if not rows to catch empty rows.
home/import_assessments.py
Outdated
@@ -185,6 +214,21 @@ def create_shadow_assessment_from_row( | |||
) | |||
assessment.questions.append(question) | |||
|
|||
def validate_headers( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have to pass in MANDATORY_HEADERS here? validate_headers is in the same class where it is called and can also access the variable directly.
home/import_assessments.py
Outdated
original_headers = rows[0].keys() | ||
headers_mapping = { | ||
header: self.to_snake_case(header) for header in original_headers | ||
} | ||
snake_case_headers = list(headers_mapping.values()) | ||
self.validate_headers(snake_case_headers, MANDATORY_HEADERS, row_num=1) | ||
transformed_rows = [ | ||
{headers_mapping[key]: value for key, value in row.items()} for row in rows | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should some of this not maybe live inside the parse_file function? I see we have a function in there called fix_rows?
I'm not very familiar with all the import code so I'm just asking
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we can move some there so it's also more accessible to other apps that will share the helper functions. There we can check if there are rows. We could perhaps handle snake_case conversion there too. I'll take a look.
… apps for file imports. - Check empty rows() now actioned for assessment, content pages imports and any other cms imports that uses the parse_file function. - We are not able to move snake_case into fix_rows() in import_helpers.py because contentset uses PascalCase for headers. If we snake_cased every import, there will be key errors in contentsets.
home/import_assessments.py
Outdated
row_num=row_num, | ||
) | ||
|
||
def to_snake_case(self, s: str) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this being used?
home/import_helpers.py
Outdated
def to_snake_case(s: str) -> str: | ||
""" | ||
Converts string to snake_case. | ||
""" | ||
return re.sub(r"[\W_]+", "_", s).lower().strip("_") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather have a specific set of characters that we convert, perhaps just spaces and dashes. Anything too far off should be an error to avoid confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do it that way then making it reusable across all imports could be more cumbersome since they all have their various headers?
home/import_helpers.py
Outdated
if any(char in s for char in INVALID_CHARACTERS): | ||
raise ImportException( | ||
f"Invalid header: '{s}' contains invalid characters.", row_num=row_num | ||
) | ||
return re.sub(r"[\W_]+", "_", s).strip("_") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than maintaining a long list of invalid non-word characters (some of which we may potentially want to use in the future) and replacing everything else with _
, we can replace just the _
-equivalent characters:
if any(char in s for char in INVALID_CHARACTERS): | |
raise ImportException( | |
f"Invalid header: '{s}' contains invalid characters.", row_num=row_num | |
) | |
return re.sub(r"[\W_]+", "_", s).strip("_") | |
return s.replace(" ", "_").replace("-", "_").strip("_") |
(I typed this directly into the PR comment, so check that it works before accepting the suggestion.)
home/import_assessments.py
Outdated
@classmethod | ||
def check_missing_fields(cls, row: dict[str, str], row_num: int) -> None: | ||
""" | ||
Checks for missing required fields in the row and raises an exception if any is missing. | ||
""" | ||
missing_fields = [field for field in MANDATORY_HEADERS if field not in row] | ||
if missing_fields: | ||
raise ImportAssessmentException( | ||
f"The import file is missing required fields: {', '.join(missing_fields)}", | ||
row_num, | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this already covered by AssessmentImporter.validate_headers
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, validate_headers checks for missing headers before writing to rows. Check_missing_fields then checks that the mandatory headers do not have missing fields in its rows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All rows will have exactly the same set of keys, so if the headers aren't missing there they won't be missing here. If the intent is to check for missing values, we should change the code to do that instead of just checking the keys.
(Unless we're removing empty fields somewhere along the way and I missed it?)
home/import_assessments.py
Outdated
raise ImportAssessmentException( | ||
"The import file is missing some required fields." | ||
f"The import file is missing required fields: {', '.join(missing_fields)}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps "Row missing values for required fields: {...}"
instead?
Purpose
There is need for CMS forms to have a more robust import mechanism.
Also, we need the error handling to be more intuitive so there's better guidance for the user when there's an import failure
Solution
This pr includes:
A header validator function which validates the header before writing to rows
A list of mandatory headers that each import should have
A snakecase function which converts the headers to snake case so the import is more flexible
What this means is that if a user enters Generic Error instead of generic_error as a header name, it will convert and pass.
Checklist